Speaker adaptation of DNN-based ASR with i-vectors: does it actually adapt models to speakers?
Authors
Abstract
Deep neural networks (DNNs) are currently very successful for acoustic modeling in ASR systems. One of the main challenges with DNNs is unsupervised speaker adaptation from an initial speaker clustering, because DNNs have a very large number of parameters. Recently, a method has been proposed to adapt DNNs to speakers by combining speaker-specific information (in the form of i-vectors computed at the speaker-cluster level) with fMLLR-transformed acoustic features. In this paper we try to gain insight into what kind of adaptation is performed on DNNs when i-vectors are stacked with acoustic features, and what information exactly is carried by i-vectors. We observe on the REPERE corpus that DNNs trained on i-vector features concatenated with fMLLR-transformed acoustic features lead to a gain of 0.7 points. The experiments show that i-vector stacking in DNN acoustic models performs not only speaker adaptation but also adaptation to acoustic conditions.
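The stacking step described above can be pictured with a minimal sketch. It assumes toy dimensions and uses hypothetical helper names (`stack_ivector`, `affine`, `fold_ivector_into_bias`); it is not the authors' implementation. Every fMLLR frame in a speaker cluster gets the same cluster-level i-vector appended, so the i-vector's contribution to the first hidden layer, W_i i, is identical for all frames of the cluster and behaves like a cluster-dependent bias shift. The code checks that equivalence numerically.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]  # a matrix stored as a list of rows


def stack_ivector(frames: List[Vector], ivector: Vector) -> List[Vector]:
    """Append the same cluster-level i-vector to every fMLLR frame."""
    return [frame + ivector for frame in frames]


def affine(weights: Matrix, bias: Vector, x: Vector) -> Vector:
    """Return W x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def fold_ivector_into_bias(weights: Matrix, bias: Vector, ivector: Vector, feat_dim: int) -> Vector:
    """Return b + W_i i, where W_i are the weight columns that read the i-vector part of the input."""
    w_i = [row[feat_dim:] for row in weights]
    return affine(w_i, bias, ivector)


# Toy dimensions (hypothetical): 2-dim "fMLLR" frames and a 2-dim i-vector.
feat_dim = 2
frames = [[1.0, 2.0], [3.0, 4.0]]   # two frames from one speaker cluster
ivector = [0.5, -1.0]                # one i-vector shared by the whole cluster
weights = [[1.0, 0.0, 2.0, 1.0],     # first hidden layer: 2 units x (2 + 2) inputs
           [0.0, 1.0, -1.0, 3.0]]
bias = [1.0, -2.0]

stacked = stack_ivector(frames, ivector)
w_x = [row[:feat_dim] for row in weights]  # columns that read the acoustic features
b_eff = fold_ivector_into_bias(weights, bias, ivector, feat_dim)

# The stacked input yields the same first-layer pre-activations as the plain
# fMLLR frames combined with the cluster-dependent bias b + W_i i.
for frame, stacked_frame in zip(frames, stacked):
    full = affine(weights, bias, stacked_frame)
    folded = affine(w_x, b_eff, frame)
    assert all(math.isclose(a, c) for a, c in zip(full, folded))

print(stacked)
print(b_eff)
```

Read this way, the network only sees the i-vector as a per-cluster offset on its first layer, so it can exploit whatever the i-vector happens to encode, whether speaker identity or recording conditions.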
Similar resources
Robust i-vector based adaptation of DNN acoustic model for speech recognition
In the past, conventional i-vectors based on a Universal Background Model (UBM) have been successfully used as input features to adapt a Deep Neural Network (DNN) Acoustic Model (AM) for Automatic Speech Recognition (ASR). In contrast, this paper introduces Hidden Markov Model (HMM) based i-vectors that use HMM state alignment information from an ASR system for estimating i-vectors. Further, we ...
Speaker Adaptation in DNN-Based Speech Synthesis Using d-Vectors
The paper presents a mechanism to perform speaker adaptation in speech synthesis based on deep neural networks (DNNs). The mechanism extracts speaker identification vectors, so-called d-vectors, from the training speakers and uses them jointly with the linguistic features to train a multi-speaker DNN-based text-to-speech synthesizer (DNN-TTS). The d-vectors are derived by applying principal compo...
Investigating factor analysis features for deep neural networks in noisy speech recognition
The problem of speaker and channel adaptation in deep neural network (DNN) based automatic speech recognition (ASR) systems is of substantial interest in advancing the performance of these systems. Recently, the speaker identity vectors (i-vectors) have shown improvements for ASR systems in matched conditions. In this paper, we propose the application of the general factor analysis framework fo...
Incorporating Context Information into Deep Neural Network Acoustic Models
The introduction of deep neural networks (DNNs) has advanced the performance of automatic speech recognition (ASR) tremendously. On a wide range of ASR tasks, DNN models outperform the traditional Gaussian mixture models (GMMs). Although making significant advances, DNN models still suffer from data scarcity, speaker mismatch and environment variability. This thesis resolves...
Context Adaptive Neural Network for Rapid Adaptation of Deep CNN Based Acoustic Models
Using auxiliary input features has been seen as one of the most effective ways to adapt deep neural network (DNN)-based acoustic models to speakers or environments. However, this approach has several limitations. It only performs compensation of the bias term of the hidden layer and therefore does not fully exploit the network capabilities. Moreover, it may not be well suited for certain types of...
Publication date: 2014